Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination

Yang, Yue; Yao, Wenlin; Zhang, Hongming; Wang, Xiaoyang; Yu, Dong; Chen, Jianshu

arXiv:2210.12261 (cs)

[Submitted on 21 Oct 2022]

Abstract: Large-scale pretrained language models have made significant advances in solving downstream language understanding tasks. However, they generally suffer from reporting bias, the phenomenon describing the lack of explicit commonsense knowledge in written text, e.g., "an orange is orange". To overcome this limitation, we develop a novel approach, Z-LaVI, to endow language models with visual imagination capabilities. Specifically, we leverage two complementary types of "imaginations": (i) recalling existing images through retrieval and (ii) synthesizing nonexistent images via text-to-image generation. Jointly exploiting the language inputs and the imagination, a pretrained vision-language model (e.g., CLIP) eventually composes a zero-shot solution to the original language tasks. Notably, fueling language models with imagination can effectively leverage visual knowledge to solve plain language tasks. In consequence, Z-LaVI consistently improves the zero-shot performance of existing language models across a diverse set of language tasks.
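
To make the idea of jointly exploiting language inputs and imagination concrete, the following is a minimal sketch and not the authors' implementation. It assumes a multiple-choice task, assumes each answer choice already has a small set of "imagined" images (retrieved or generated), and uses toy 2-D vectors in place of real CLIP embeddings. The weighted-sum combination rule, the `weight` parameter, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def imagination_probs(prompt_vec, images_per_choice):
    """Score each choice by the mean similarity between the prompt
    embedding and that choice's imagined images (an assumed rule)."""
    scores = [
        sum(cosine(prompt_vec, img) for img in imgs) / len(imgs)
        for imgs in images_per_choice
    ]
    return softmax(scores)


def combine(lm_probs, vis_probs, weight=0.5):
    """Weighted mixture of language-model and imagination probabilities.
    The mixture form and the default weight are assumptions."""
    return [(1 - weight) * p + weight * q for p, q in zip(lm_probs, vis_probs)]


# Hypothetical example: "What color is an orange?"
choices = ["blue", "orange"]

# Made-up language-model probabilities that slightly prefer the wrong answer,
# standing in for the effect of reporting bias.
lm_probs = [0.55, 0.45]

# Toy stand-ins for vision-language embeddings (not real CLIP features).
prompt_vec = [1.0, 0.0]
images_per_choice = [
    [[0.0, 1.0], [0.1, 1.0]],  # imagined images for "blue"
    [[1.0, 0.0], [1.0, 0.1]],  # imagined images for "orange"
]

vis_probs = imagination_probs(prompt_vec, images_per_choice)
final_probs = combine(lm_probs, vis_probs)
answer = choices[final_probs.index(max(final_probs))]
print(answer)
```

In this toy setup the language-only scores favor "blue", while the combined scores favor "orange". With a real system, the vectors would come from a vision-language encoder and the images from a retrieval engine or a text-to-image model.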

Comments:	EMNLP 2022
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2210.12261 [cs.CL]
	(or arXiv:2210.12261v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.12261

Submission history

From: Yue Yang
[v1] Fri, 21 Oct 2022 21:33:10 UTC (25,637 KB)
